Figure 1: Summary Output - Initial Model

## 
## Call:
## lm(formula = cmRate ~ region + pctPrivateHC + pctEmployerHC + 
##     pctPublicHC, data = mortalityRates)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -75.968 -12.971   0.096  12.430 116.450 
## 
## Coefficients:
##                 Estimate Std. Error t value Pr(>|t|)    
## (Intercept)     150.4018    21.6748   6.939 1.16e-11 ***
## regionNortheast  -3.9740     3.6837  -1.079 0.281171    
## regionSoutheast  10.7338     2.8554   3.759 0.000189 ***
## regionSouthwest  -7.6100     4.3959  -1.731 0.084008 .  
## regionWest      -18.9862     3.7267  -5.095 4.87e-07 ***
## pctPrivateHC     -0.7575     0.2080  -3.642 0.000297 ***
## pctEmployerHC     0.8252     0.2220   3.717 0.000223 ***
## pctPublicHC       1.1768     0.2567   4.584 5.69e-06 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 23.08 on 529 degrees of freedom
## Multiple R-squared:  0.2955, Adjusted R-squared:  0.2861 
## F-statistic: 31.69 on 7 and 529 DF,  p-value: < 2.2e-16
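
The coefficient table above lists four region terms but no Midwest row, which is consistent with R's default treatment coding using Midwest as the reference level. The following is a minimal sketch of that coding, assuming `region` is a factor with the default alphabetical level order; the small `region` vector here is hypothetical and is not the `mortalityRates` data.

```r
# Hypothetical five-level factor mirroring the region labels in the output.
region <- factor(c("Midwest", "Northeast", "Southeast", "Southwest", "West"))

# Default treatment (dummy) coding: the first level becomes the reference,
# so it gets no column of its own and is absorbed into the intercept.
design <- model.matrix(~ region)

print(levels(region)[1])   # reference level
print(colnames(design))    # intercept plus one indicator per non-reference level
```

Under this coding each region coefficient is read as a difference from the Midwest with the other predictors held fixed.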

Figure 2: Residuals Plot - Initial Model


Figure 3: QQ Plot of Residuals - Initial Model


Figure 4: Scatterplot Matrix of Data


Check for Multicollinearity and Overfitting

##  pctPrivateHC pctEmployerHC   pctPublicHC 
##      3.306927      4.060143      3.002154
## 
##   Midwest Northeast Southeast Southwest      West 
##       178        53       199        51        56
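
A variance inflation factor is defined as 1 / (1 - R_j^2), where R_j^2 comes from regressing predictor j on the remaining predictors. Read that way, the value of 4.06 for pctEmployerHC corresponds to an R_j^2 of about 0.75. The sketch below computes VIFs by hand from that definition. It uses simulated data, and `manual_vif`, `sim`, `x1`, `x2`, and `x3` are hypothetical names introduced for illustration, so the numbers it prints are unrelated to the values above.

```r
set.seed(42)  # simulated data; NOT the mortalityRates data set
n <- 300
shared <- rnorm(n)
sim <- data.frame(
  x1 = shared + rnorm(n, sd = 0.7),  # x1 and x2 share a common component
  x2 = shared + rnorm(n, sd = 0.7),
  x3 = rnorm(n)                      # x3 is independent of the others
)

# VIF_j = 1 / (1 - R_j^2), with R_j^2 from regressing x_j on the other columns.
manual_vif <- function(data) {
  sapply(names(data), function(v) {
    others <- setdiff(names(data), v)
    r2 <- summary(lm(reformulate(others, response = v), data = data))$r.squared
    1 / (1 - r2)
  })
}

vifs <- manual_vif(sim)
print(round(vifs, 3))
```

The correlated pair `x1` and `x2` should show larger values than the independent `x3`, which stays close to 1.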

Figure 5: Histogram of pctPublicHC (with and without a log transformation)


Figure 5.1: Residuals of Log Transformed Response vs Public


Figure 5.2: Histogram of pctPrivateHC (with and without a log transformation)


Figure 5.3: Residuals of Log Transformed Response vs Private


Figure 6: Second Model - Removing pctEmployerHC variable to address multicollinearity

## 
## Call:
## lm(formula = cmRate ~ region + pctPrivateHC + pctPublicHC, data = mortalityRates)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -80.157 -13.675   0.516  13.226 113.151 
## 
## Coefficients:
##                 Estimate Std. Error t value Pr(>|t|)    
## (Intercept)     181.7442    20.2077   8.994  < 2e-16 ***
## regionNortheast  -1.3370     3.6583  -0.365 0.714906    
## regionSoutheast  10.7951     2.8896   3.736 0.000207 ***
## regionSouthwest  -9.6164     4.4151  -2.178 0.029841 *  
## regionWest      -20.9214     3.7345  -5.602 3.41e-08 ***
## pctPrivateHC     -0.4339     0.1911  -2.270 0.023626 *  
## pctPublicHC       0.6852     0.2227   3.077 0.002198 ** 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 23.36 on 530 degrees of freedom
## Multiple R-squared:  0.2771, Adjusted R-squared:  0.2689 
## F-statistic: 33.85 on 6 and 530 DF,  p-value: < 2.2e-16

Checking for Multicollinearity for Second Linear Model

## pctPrivateHC  pctPublicHC 
##      2.27376      2.27376
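
The two values are identical for a structural reason. Assuming the VIFs were computed on a model containing only these two numeric predictors (which the two-entry output suggests), each predictor is regressed on the other alone, so both VIFs depend only on their pairwise correlation \(r\):

\[
\mathrm{VIF}_{pctPrivateHC} = \mathrm{VIF}_{pctPublicHC} = \frac{1}{1 - r^2}
\quad\Longrightarrow\quad
r^2 = 1 - \frac{1}{2.27376} \approx 0.560, \qquad |r| \approx 0.748
\]

The sign of \(r\) cannot be recovered from the VIF alone.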

Figure 7: Residuals and QQ Plot for Second Linear Model

***

Figure 8: Exploring Additional Variables


Figure 9: Scatterplot Matrix of Additional Variables, with Log Transformations of Average Household Size and Median Income


Figure 10: Residuals and QQ Plot of Third Linear Model with inclusion of Median Income variable

## 
## Call:
## lm(formula = cmRate ~ region + pctPrivateHC + pctPublicHC + log(medianIncome), 
##     data = mortalityRates)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -82.772 -13.206   0.423  13.530 110.317 
## 
## Coefficients:
##                   Estimate Std. Error t value Pr(>|t|)    
## (Intercept)       357.2177    92.9614   3.843 0.000136 ***
## regionNortheast     1.1095     3.8620   0.287 0.774005    
## regionSoutheast    10.9771     2.8837   3.807 0.000157 ***
## regionSouthwest    -8.1766     4.4663  -1.831 0.067701 .  
## regionWest        -19.0671     3.8464  -4.957 9.65e-07 ***
## pctPrivateHC       -0.2557     0.2118  -1.207 0.227814    
## pctPublicHC         0.4747     0.2473   1.919 0.055480 .  
## log(medianIncome) -16.7770     8.6766  -1.934 0.053697 .  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 23.3 on 529 degrees of freedom
## Multiple R-squared:  0.2821, Adjusted R-squared:  0.2726 
## F-statistic:  29.7 on 7 and 529 DF,  p-value: < 2.2e-16
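
Because median income enters through R's natural-log `log()`, its coefficient describes the change in expected cmRate per proportional change in income rather than per dollar. As a worked reading of the estimate above, holding the other predictors fixed, a proportional increase \(p\) in median income shifts the expected response by \(\hat\beta \log(1 + p)\):

\[
p = 0.01:\; -16.7770 \times \log(1.01) \approx -0.167,
\qquad
p = 0.10:\; -16.7770 \times \log(1.10) \approx -1.60
\]

Both figures are in the units of cmRate and are point estimates only; the coefficient's p-value of 0.054 sits just above the 0.05 level.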


Check for Multicollinearity

vif(lm(cmRate ~ pctPrivateHC + pctPublicHC + log(medianIncome), data = mortalityRates))
##      pctPrivateHC       pctPublicHC log(medianIncome) 
##          2.947409          2.935322          3.334527

Figure 11: Testing for Interactions - Scatterplot of pctPrivateHC vs cmRate by Region

***

Figure 12: Model with Interaction for Plot

## 
## Call:
## lm(formula = cmRate ~ pctPrivateHC * region, data = mortalityRates)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -82.646 -12.910   1.238  13.341 108.228 
## 
## Coefficients:
##                              Estimate Std. Error t value Pr(>|t|)    
## (Intercept)                  273.8904    17.9985  15.217  < 2e-16 ***
## pctPrivateHC                  -1.3981     0.2514  -5.562 4.25e-08 ***
## regionNortheast               -9.5693    44.5894  -0.215  0.83015    
## regionSoutheast              -31.9221    20.5643  -1.552  0.12119    
## regionSouthwest              -87.3600    27.9584  -3.125  0.00188 ** 
## regionWest                   -78.8535    32.2949  -2.442  0.01495 *  
## pctPrivateHC:regionNortheast   0.1197     0.6155   0.194  0.84587    
## pctPrivateHC:regionSoutheast   0.5791     0.3006   1.927  0.05456 .  
## pctPrivateHC:regionSouthwest   1.1228     0.4405   2.549  0.01110 *  
## pctPrivateHC:regionWest        0.8193     0.4866   1.684  0.09283 .  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 23.45 on 527 degrees of freedom
## Multiple R-squared:  0.2754, Adjusted R-squared:  0.263 
## F-statistic: 22.26 on 9 and 527 DF,  p-value: < 2.2e-16

Figure 13: Fourth Linear Model Comparison without and with Interaction

## 
## Call:
## lm(formula = cmRate ~ log(medianIncome) + pctPublicHC + pctPrivateHC + 
##     region, data = mortalityRates)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -82.772 -13.206   0.423  13.530 110.317 
## 
## Coefficients:
##                   Estimate Std. Error t value Pr(>|t|)    
## (Intercept)       357.2177    92.9614   3.843 0.000136 ***
## log(medianIncome) -16.7770     8.6766  -1.934 0.053697 .  
## pctPublicHC         0.4747     0.2473   1.919 0.055480 .  
## pctPrivateHC       -0.2557     0.2118  -1.207 0.227814    
## regionNortheast     1.1095     3.8620   0.287 0.774005    
## regionSoutheast    10.9771     2.8837   3.807 0.000157 ***
## regionSouthwest    -8.1766     4.4663  -1.831 0.067701 .  
## regionWest        -19.0671     3.8464  -4.957 9.65e-07 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 23.3 on 529 degrees of freedom
## Multiple R-squared:  0.2821, Adjusted R-squared:  0.2726 
## F-statistic:  29.7 on 7 and 529 DF,  p-value: < 2.2e-16
## 
## Call:
## lm(formula = cmRate ~ log(medianIncome) + pctPublicHC + pctPrivateHC * 
##     region, data = mortalityRates)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -81.083 -13.557   1.003  12.978 105.507 
## 
## Coefficients:
##                              Estimate Std. Error t value Pr(>|t|)    
## (Intercept)                  399.1541    97.4688   4.095 4.88e-05 ***
## log(medianIncome)            -17.0372     8.9014  -1.914  0.05616 .  
## pctPublicHC                    0.5051     0.2514   2.009  0.04502 *  
## pctPrivateHC                  -0.8194     0.2929  -2.797  0.00534 ** 
## regionNortheast              -37.4411    44.7276  -0.837  0.40292    
## regionSoutheast              -36.3780    20.4196  -1.782  0.07541 .  
## regionSouthwest              -82.6308    27.6938  -2.984  0.00298 ** 
## regionWest                   -80.6499    32.5117  -2.481  0.01343 *  
## pctPrivateHC:regionNortheast   0.5424     0.6196   0.875  0.38180    
## pctPrivateHC:regionSoutheast   0.6814     0.2990   2.279  0.02307 *  
## pctPrivateHC:regionSouthwest   1.1500     0.4358   2.639  0.00857 ** 
## pctPrivateHC:regionWest        0.8977     0.4895   1.834  0.06723 .  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 23.19 on 525 degrees of freedom
## Multiple R-squared:  0.2941, Adjusted R-squared:  0.2794 
## F-statistic: 19.89 on 11 and 525 DF,  p-value: < 2.2e-16
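
The two summaries above are nested models, so the four interaction terms can be assessed jointly with a partial F-test. The sketch below approximates that test from the printed R-squared values and degrees of freedom. Because those inputs are rounded, the result is only approximate; an exact version would call `anova()` on the two fitted model objects, whose names are not shown in this appendix.

```r
# Values copied from the two printed summaries (rounded to four decimals).
r2_reduced <- 0.2821      # model without interaction
r2_full    <- 0.2941      # model with pctPrivateHC * region
df_full    <- 525         # residual degrees of freedom, full model
q          <- 529 - 525   # number of interaction terms added

# Partial F statistic for the q added terms.
f_stat  <- ((r2_full - r2_reduced) / q) / ((1 - r2_full) / df_full)
p_value <- pf(f_stat, df1 = q, df2 = df_full, lower.tail = FALSE)

print(c(F = f_stat, p = p_value))
```

The statistic comes out near 2.2 on 4 and 525 degrees of freedom, in line with the small gain in adjusted R-squared from 0.2726 to 0.2794.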

Figure 14: Residuals and QQ Plot of Fourth Linear Model with Interaction

***

Checking for Multicollinearity of Fourth Model

## log(medianIncome)       pctPublicHC      pctPrivateHC 
##          3.334527          2.935322          2.947409

Figure 15: Fifth Linear Model Comparison with no Region

## 
## Call:
## lm(formula = cmRate ~ pctPrivateHC + pctPublicHC + log(medianIncome), 
##     data = mortalityRates)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -89.154 -14.268   1.397  15.653 104.967 
## 
## Coefficients:
##                   Estimate Std. Error t value Pr(>|t|)    
## (Intercept)       539.5016    91.1376   5.920 5.78e-09 ***
## pctPrivateHC       -0.1526     0.1803  -0.846    0.398    
## pctPublicHC         0.3607     0.2438   1.479    0.140    
## log(medianIncome) -33.8663     8.4751  -3.996 7.35e-05 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 24.89 on 533 degrees of freedom
## Multiple R-squared:  0.1747, Adjusted R-squared:   0.17 
## F-statistic:  37.6 on 3 and 533 DF,  p-value: < 2.2e-16

Figure 16: Final Model - CMF

\(E(Y_{cmRate}|X) = \beta_0 + \beta_1 \log(X_{medianIncome}) + \beta_2 X_{pctPublicHC} + \beta_3 X_{pctPrivateHC} + \beta_4 I_{NE} + \beta_5 I_{SE} + \beta_6 I_{SW} + \beta_7 I_{W}\) \(+ \beta_8 X_{pctPrivateHC} I_{NE} + \beta_{9} X_{pctPrivateHC} I_{SE} + \beta_{10} X_{pctPrivateHC} I_{SW} + \beta_{11} X_{pctPrivateHC} I_{W}\)


Figure 17: Final Model

## 
## Call:
## lm(formula = cmRate ~ log(medianIncome) + pctPublicHC + pctPrivateHC * 
##     region, data = mortalityRates)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -81.083 -13.557   1.003  12.978 105.507 
## 
## Coefficients:
##                              Estimate Std. Error t value Pr(>|t|)    
## (Intercept)                  399.1541    97.4688   4.095 4.88e-05 ***
## log(medianIncome)            -17.0372     8.9014  -1.914  0.05616 .  
## pctPublicHC                    0.5051     0.2514   2.009  0.04502 *  
## pctPrivateHC                  -0.8194     0.2929  -2.797  0.00534 ** 
## regionNortheast              -37.4411    44.7276  -0.837  0.40292    
## regionSoutheast              -36.3780    20.4196  -1.782  0.07541 .  
## regionSouthwest              -82.6308    27.6938  -2.984  0.00298 ** 
## regionWest                   -80.6499    32.5117  -2.481  0.01343 *  
## pctPrivateHC:regionNortheast   0.5424     0.6196   0.875  0.38180    
## pctPrivateHC:regionSoutheast   0.6814     0.2990   2.279  0.02307 *  
## pctPrivateHC:regionSouthwest   1.1500     0.4358   2.639  0.00857 ** 
## pctPrivateHC:regionWest        0.8977     0.4895   1.834  0.06723 .  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 23.19 on 525 degrees of freedom
## Multiple R-squared:  0.2941, Adjusted R-squared:  0.2794 
## F-statistic: 19.89 on 11 and 525 DF,  p-value: < 2.2e-16
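
With the interaction in the model, the pctPrivateHC coefficient is the slope for the reference region (Midwest) only; each other region's slope is that value plus its interaction term. The sketch below does this arithmetic using the estimates printed above. The variable names are hypothetical, and the results are point estimates without standard errors, which would require the coefficient covariance matrix.

```r
# Estimates copied from the final model summary.
base_slope <- -0.8194  # pctPrivateHC: slope in the reference region (Midwest)
shift <- c(
  Midwest   = 0,       # reference level, no interaction term
  Northeast = 0.5424,  # pctPrivateHC:regionNortheast
  Southeast = 0.6814,  # pctPrivateHC:regionSoutheast
  Southwest = 1.1500,  # pctPrivateHC:regionSouthwest
  West      = 0.8977   # pctPrivateHC:regionWest
)

# Region-specific slope of cmRate on pctPrivateHC.
region_slopes <- base_slope + shift
print(round(region_slopes, 4))
```

The estimated slope is negative in the Midwest, Northeast, and Southeast and slightly positive in the Southwest and West, although the Northeast and West interaction terms are not significant at the 0.05 level.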

##      pctPrivateHC       pctPublicHC log(medianIncome) 
##          2.947409          2.935322          3.334527